step time

Unsupervised Ordering for Maximum Clique

Min, Yimeng, Gomes, Carla P.

arXiv.org Artificial Intelligence

We propose an unsupervised approach for learning vertex orderings for the maximum clique problem by framing it within a permutation-based framework. We transform the combinatorial constraints into geometric relationships so that the ordering of vertices aligns with the clique structure. By integrating this clique-oriented ordering into branch-and-bound search, we improve search efficiency and reduce the number of computational steps. Our results demonstrate how unsupervised learning of vertex orderings can enhance search efficiency across diverse graph instances. We further study generalization across different graph sizes.
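As a rough illustration of why vertex ordering matters here, the sketch below runs a standard branch-and-bound maximum clique search that expands candidates in a supplied order; the function name and the toy graph are our own, and the ordering would in the paper's setting come from the learned model rather than be hand-picked:

```python
def max_clique_bnb(adj, order):
    """Branch-and-bound maximum clique search expanding vertices in a given
    order; a clique-aligned order tends to find large cliques early, so the
    size-based bound prunes more of the search tree.
    `adj` maps each vertex to its set of neighbours; `order` is a
    permutation of the vertices."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for i, v in enumerate(candidates):
            # Bound: even taking every remaining candidate cannot beat `best`.
            if len(clique) + len(candidates) - i <= len(best):
                return
            # Keep only later candidates adjacent to v.
            expand(clique + [v],
                   [u for u in candidates[i + 1:] if u in adj[v]])

    expand([], sorted(adj, key=order.index))
    return best
```

For example, on a triangle {0, 1, 2} with a pendant vertex 3, any ordering recovers the triangle, but an ordering that front-loads clique members reaches the incumbent sooner and prunes earlier.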


Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training

Du, Xianzhi, Gunter, Tom, Kong, Xiang, Lee, Mark, Wang, Zirui, Zhang, Aonan, Du, Nan, Pang, Ruoming

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) enjoys performance gains by increasing model capacity while keeping computation cost constant. When comparing MoE to dense models, prior work typically adopts the following setting: 1) use FLOPs or activated parameters as a measure of model complexity; 2) train all models to the same number of tokens. We argue that this setting favors MoE, as FLOPs and activated parameters do not accurately measure the communication overhead in sparse layers, leading to a larger actual training budget for MoE. In this work, we revisit these settings by adopting step time as a more accurate measure of model complexity, and by determining the total compute budget under the Chinchilla compute-optimal settings. To efficiently run MoE on modern accelerators, we adopt a 3D sharding method that keeps the dense-to-MoE step time increase within a healthy range. We evaluate MoE and dense LLMs on a set of nine 0-shot and two 1-shot English tasks, as well as MMLU 5-shot and GSM8K 8-shot, across three model scales at 6.4B, 12.6B, and 29.6B. Experimental results show that even under these settings, MoE consistently outperforms dense LLMs on the speed-accuracy trade-off curve with meaningful gaps. Our full model implementation and sharding strategy have been released at https://github.com/apple/axlearn.
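The budgeting argument can be made concrete with back-of-the-envelope arithmetic. The sketch below uses the common Chinchilla rule of thumb of roughly 20 training tokens per parameter and the standard C ≈ 6·N·D FLOPs approximation; both helper functions and their parameter values are illustrative assumptions, not the paper's exact accounting:

```python
def chinchilla_budget(n_params, tokens_per_param=20):
    """Chinchilla-style compute-optimal budget: train on roughly
    `tokens_per_param` tokens per parameter; approximate total training
    FLOPs with the rule of thumb C ~= 6 * N * D."""
    tokens = n_params * tokens_per_param
    flops = 6 * n_params * tokens
    return tokens, flops

def matched_steps(dense_step_time_s, moe_step_time_s, dense_steps):
    """Under a step-time (wall-clock) budget rather than a FLOPs budget,
    a model that is slower per step gets proportionally fewer steps."""
    budget_s = dense_step_time_s * dense_steps
    return int(budget_s / moe_step_time_s)
```

For instance, an MoE variant whose step is 25% slower than the dense baseline gets only 80% as many optimizer steps under the same wall-clock budget, which is exactly the overhead that a FLOPs-matched comparison hides.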


ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

Zhu, Feiwen, Nowaczynski, Arkadiusz, Li, Rundong, Xin, Jie, Song, Yifei, Marcinkiewicz, Michal, Eryilmaz, Sukru Burc, Yang, Jun, Andersch, Michael

arXiv.org Artificial Intelligence

AlphaFold2 has been hailed as a breakthrough in protein folding: it can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training code. OpenFold is the first trainable public reimplementation of AlphaFold. The AlphaFold training procedure is prohibitively time-consuming and sees diminishing returns from scaling to more compute resources. In this work, we conducted a comprehensive analysis of the AlphaFold training procedure based on OpenFold and identified inefficient communication and overhead-dominated computation as the key factors preventing AlphaFold training from scaling effectively. We introduced ScaleFold, a systematic training method that incorporates optimizations specifically targeting these factors. ScaleFold successfully scaled AlphaFold training to 2080 NVIDIA H100 GPUs with high resource utilization. In the MLPerf HPC v3.0 benchmark, ScaleFold finished the OpenFold benchmark in 7.51 minutes, an over $6\times$ speedup over the baseline. For training the AlphaFold model from scratch, ScaleFold completed pretraining in 10 hours, a significant improvement over the seven days required by the original AlphaFold pretraining baseline.


A Model Predictive Capture Point Control Framework for Robust Humanoid Balancing via Ankle, Hip, and Stepping Strategies

Kim, Myeong-Ju, Lim, Daegyu, Park, Gyeongjae, Park, Jaeheung

arXiv.org Artificial Intelligence

The robust balancing capability of humanoid robots against disturbances is considered one of the crucial requirements for their practical mobility in real-world environments. In particular, many studies have been devoted to efficient implementation of the three balance strategies inspired by human balancing (the ankle, hip, and stepping strategies) to endow humanoid robots with human-level balancing capability. In this paper, a robust balance control framework for humanoid robots is proposed. Firstly, a novel Model Predictive Control (MPC) framework is proposed for Capture Point (CP) tracking control, enabling the integration of the ankle, hip, and stepping strategies within a single framework. Additionally, a variable weighting method is introduced that adjusts the weighting parameters of the Centroidal Angular Momentum (CAM) damping control over the time horizon of the MPC to improve balancing performance. Secondly, a hierarchical structure of the MPC and a stepping controller is proposed, allowing for step time optimization. The robust balancing performance of the proposed method is validated through extensive simulations and real robot experiments. Furthermore, superior balancing performance is demonstrated, particularly in the presence of disturbances, compared to a state-of-the-art Quadratic Programming (QP)-based CP controller that employs the ankle, hip, and stepping strategies. The supplementary video is available at https://youtu.be/CrD75UbYzdc
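For readers unfamiliar with the Capture Point being tracked here, the sketch below computes it under the standard Linear Inverted Pendulum model; this is the textbook quantity, not the paper's MPC formulation, and the function name is our own:

```python
import math

def capture_point(com_pos, com_vel, com_height, g=9.81):
    """Instantaneous Capture Point under the Linear Inverted Pendulum
    model: xi = x + x_dot / omega, with omega = sqrt(g / z_c).
    Stepping onto the capture point brings the pendulum to rest, which is
    why CP tracking unifies ankle, hip, and stepping responses."""
    omega = math.sqrt(g / com_height)  # natural frequency of the pendulum
    return com_pos + com_vel / omega
```

A CP-tracking controller drives this point toward a reference inside the support polygon; when it cannot (e.g., after a large push), the stepping strategy relocates the support polygon to the capture point instead.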


Baechi: Fast Device Placement of Machine Learning Graphs

Jeon, Beomyeol, Cai, Linda, Shetty, Chirag, Srivastava, Pallavi, Jiang, Jintao, Ke, Xiaolan, Meng, Yitao, Xie, Cong, Gupta, Indranil

arXiv.org Artificial Intelligence

Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or models are large. To split a model across devices, learning-based approaches remain popular. While these produce model placements that train fast on data (i.e., have low step times), learning-based model parallelism is itself time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654× to 206K× faster than state-of-the-art learning-based approaches, and (ii) the step (training) time of Baechi-placed models is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that, compared to learning-based approaches, algorithmic approaches face different challenges in adapting to machine learning systems, but they offer provable bounds and significant performance benefits.
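To illustrate the flavor of algorithmic placement, here is a toy memory-constrained earliest-finish-time heuristic in the spirit of Baechi's scheduling-based placers; the data format, tie-breaking, and cost model (per-op compute time only, no communication) are simplifying assumptions of ours, not Baechi's actual algorithms:

```python
def greedy_place(ops, devices):
    """Toy memory-constrained placement: assign each op, in topological
    order, to the device that can finish it earliest among devices with
    enough free memory.
    `ops`: list of (name, compute_time_s, memory) in topological order.
    `devices`: dict mapping device name to available memory."""
    free_mem = dict(devices)
    ready_at = {d: 0.0 for d in devices}   # when each device is next idle
    placement = {}
    for name, t, mem in ops:
        feasible = [d for d in devices if free_mem[d] >= mem]
        if not feasible:
            raise MemoryError(f"no device can hold op {name}")
        # Earliest finish time among memory-feasible devices.
        d = min(feasible, key=lambda dev: ready_at[dev] + t)
        placement[name] = d
        free_mem[d] -= mem
        ready_at[d] += t
    return placement
```

Because the heuristic runs one linear pass over the graph, its planning cost is negligible next to learning-based placers that retrain a policy per model, which is the source of the orders-of-magnitude planning speedups reported above.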